Multi-language Speech Collection for NIST LRE

Authors

  • Karen Jones
  • Stephanie Strassel
  • Kevin Walker
  • David Graff
  • Jonathan Wright
Abstract

The Multi-language Speech (MLS) Corpus supports NIST’s Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband speech data in 20 languages/dialects. The corpus was designed to test how well systems distinguish closely related or confusable linguistic varieties, so careful manual auditing of the collected data was an important part of this work. This paper lists the specific data requirements for the collection, explains the rationale behind those requirements, and outlines the steps taken to ensure that all goals were met as specified. LDC conducted a large-scale recruitment effort that implemented candidate assessment and interview techniques suited to hiring a large contingent of telecommuting workers; this effort is discussed in detail. We also describe the telephone and broadcast collection infrastructure and protocols, and detail the steps taken to pre-process the collected data before auditing. Finally, we present annotation training, procedures, and outcomes in detail.

Similar resources

New resources for recognition of confusable linguistic varieties: the LRE11 corpus

The NIST 2011 Language Recognition Evaluation focuses on language pair discrimination for 24 languages/dialects, some of which may be considered mutually intelligible or closely related. The LRE11 evaluation required new data for all languages, comprising both conversational telephone speech and broadcast narrowband speech from multiple sources in each language. Given the potential confusion am...

Multilevel and channel-compensated language recognition: ATVS-UAM systems at NIST LRE 2009

This paper presents the systems submitted by ATVS – Biometric Recognition Group at 2009 language recognition evaluation, organized by the National Institute of Standards and Technology of United States (NIST LRE’09). Apart from the huge size of the databases involved, two main factors turn the evaluation into a very difficult task. First, the number of languages to be recognized was the biggest...

A Study of the Influence of Speech Type on Automatic Language Recognition Performance

Automatic language recognition on spontaneous speech has experienced a rapid development in the last few years. This development has been in part due to the competitive technological Language Recognition Evaluations (LRE) organized by the National Institute of Standards and Technology (NIST). Until now, the need to have clearly defined and consistent evaluations has kept some real-life applicat...

The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance

This paper describes the most relevant features of a collaborative multi-site submission to the NIST 2011 Language Recognition Evaluation (LRE), consisting of one primary and three contrastive systems, each fusing different combinations of 13 state-of-the-art (acoustic and phonotactic) language recognition subsystems. The collaboration focused on collecting and sharing training data for those t...

Human and Computer Recognition of Regional Accents and Ethnic Groups from British English Speech

The paralinguistic information in a speech signal includes clues to the geographical and social background of the speaker. This thesis is concerned with automatic extraction of this information from a short segment of speech. A state-of-the-art Language Identification (ID) system, which is obtained by fusing variant of Gaussian mixture model and support vector machines, is developed and evalua...

Publication date: 2016